Analysis and Prediction of Unalignable Words in Parallel Text

نویسندگان

  • Frances Yung
  • Kevin Duh
  • Yuji Matsumoto
چکیده

Professional human translators usually do not employ the concept of word alignments, producing translations ‘sense-forsense’ instead of ‘word-for-word’. This suggests that unalignable words may be prevalent in the parallel text used for machine translation (MT). We analyze this phenomenon in-depth for Chinese-English translation. We further propose a simple and effective method to improve automatic word alignment by pre-removing unalignable words, and show improvements on hierarchical MT systems in both translation directions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Application of Wavelet Neural Network in Forward Kinematics Solution of 6-RSU Co-axial Parallel Mechanism Based on Final Prediction Error

Application of artificial neural network (ANN) in forward kinematic solution (FKS) of a novel co-axial parallel mechanism with six degrees of freedom (6-DOF) is addressed in Current work. The mechanism is known as six revolute-spherical-universal (RSU) and constructed by 6-RSU co-axial kinematic chains in parallel form. First, applying geometrical analysis and vectorial principles the kinematic...

متن کامل

Mining of Emerging trends of Covid-19 thematic areas in National and International publications

Background &Aim: The results from the analysis of COVID-19 literature by employing text-mining techniques are of particular importance for researchers, policymakers, and planners of medical sciences at the national and international levels, avoiding parallel research and waste of time and budget. The paper explore emerging topics and the trend of scientific words at the national and internation...

متن کامل

A new model for persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...

متن کامل

A Linguistic Analysis of Conference Titles in Applied Linguistics

Over the past twenty-five years, researchers have expressed considerable interest in titles of academic publications. Unfortunately, conference paper titles (CPTs) have only recently begun to receive attention. The aim of this study, therefore, is to investigate the text length, syntactic structure, and lexicon of CPTs in Applied Linguistics. A data set of 698 titles was selected from the 2008 ...

متن کامل

A Linguistic Analysis of Conference Titles in Applied Linguistics

Over the past twenty-five years, researchers have expressed considerable interest in titles of academic publications. Unfortunately, conference paper titles (CPTs) have only recently begun to receive attention. The aim of this study, therefore, is to investigate the text length, syntactic structure, and lexicon of CPTs in Applied Linguistics. A data set of 698 titles was selected from the 2008 ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014